Effects-Based Design of Robust Organizations

Sui Ruan, Candra Meirina, Haiying Tu, Feili Yu, and Krishna R. Pattipati


Abstract: Effects-based design of robust organizations seeks to synthesize an organizational structure and its strategies (resource allocation, task scheduling, and coordination) to achieve the desired effect(s) in a dynamically changing mission environment. In this paper, we model the dynamic system associated with the mission environment (e.g., the environment faced by a joint task force with a military objective [8], or the competitive environment confronted by a consumer electronics company striving to increase its market share) as a finite-state Markov Decision Process (MDP) [1][2][7]. Using this model, we determine a near-optimal action strategy, which specifies the action to take in each state of the MDP model, by means of the Monte Carlo control method. The action strategy determines a range of possible missions the organization may face. The range of missions and the platform utilization measures, in turn, are used to synthesize a robust organizational structure.

Keywords: Organizational Design, Markov Decision Processes, Reinforcement Learning, and Monte Carlo Control Method.

I.  Introduction 

A.  Motivation 

Market environments and battlespaces are dynamic and uncertain. Organizations seeking to achieve the desired effects in such environments are confronted with the following: 1) parts of the environment cannot be controlled directly; 2) various exogenous events may impact the state of the environment; 3) the interactions between the organization's potential actions and the dynamics of the environment may result in consequences that cannot be predicted a priori with certainty. Consequently, organizations need to plan for potential contingencies and to be flexible and adaptable. In other words, they need to be robust learning organizations. In this paper, we apply concepts from Markov Decision Processes (MDP), reinforcement learning, the Monte Carlo control method, and mixed integer optimization, as in [1]-[5] and [16], to prescribe an optimal decision strategy and the concomitant organization structure to achieve desired effects in an uncertain mission environment.

Over  the  years,  research  in  organizational 
decision-making  has  demonstrated  that  a  strong 
functional  dependency  exists  between  the  specific 
structure  of  the  mission  environment  and  the 
resulting  optimal  organizational  structure  and 
its  decision  strategy.  That  is,  the  optimality  of 
an  organization  design  depends  on  the  mission 
environment  and  the  organizational  constraints  [15]. 
Such organizations, termed congruent, are expected to perform well. Incongruence with the environmental dynamics hinders the organization's ability to achieve the desired effects, results in a higher cost, or makes adverse effects more likely.

Agility is arguably one of the most important characteristics of successful information age organizations [8]. Agile organizations do not just happen. They are the result of an organizational structure, command and control approach, concepts of operation, supporting systems, and personnel that have a synergistic mix of the right characteristics. An agile organization is a synergistic combination of the following six attributes: Robustness, Resilience, Responsiveness, Flexibility, Innovation, and Adaptation [8]. Robustness is the ability to retain a level of effectiveness across a range of missions that span the spectrum of conflicts, operating environments, and/or circumstances. An organization designed to cope with dynamic and stochastic mission environments is said to be robust in the sense that it can cope with a range of contingencies.

B.  Related  Research 

Over the past several years, mathematical and computational models of organizations have attracted a great deal of interest in various fields of scientific research [15]. Many research efforts have focused on the problem of quantifying the structural (mis)match between organizations and their tasks. When modeling a complex mission and designing the corresponding organization, the variety of mission dimensions (e.g., functional, geographical, terrain), together with the required depth of model granularity, determines the complexity of the design process [14]. Model-driven synthesis of optimized organizations for a specific (deterministic) mission environment has been widely studied by the operations research community [12] [13].

On the other hand, planning and machine learning in uncertain and dynamic environments have been studied largely by the control and artificial intelligence communities. In this vein, reinforcement learning, a computational approach for understanding and automating goal-directed learning and decision-making [1]-[4], has become a dominant approach for dealing with stochastic planning problems.

C.  Scope  and  Organization  of  Paper 

In Section II, we provide an overview of our organizational design methodology, which is based on the MDP, the Monte Carlo control method, reinforcement learning, and mixed integer optimization techniques. In Section III, we formulate the dynamic environment as a finite-state MDP and formalize the objectives of the organizational design problem. In Section IV, we apply the Monte Carlo control method to prescribe near-optimal action strategies for the MDP model. Section V provides an integer programming formulation for the congruent organization design problem. Simulation results are presented in Section VI. Finally, the paper concludes with a summary and future research directions in Section VII.

II.  Organizational  Design  Methodology 

The methodology applied in this paper is shown in Fig. 1. In this study, we consider a dynamic military mission environment and the design of a robust organization for accomplishing the mission. The mission is represented via tasks and platforms (assets). Three types of tasks are considered in this paper, viz., mission tasks, time critical tasks, and “mosquitoes” (trivial tasks). Mission tasks are those that must be executed, are known in advance, and typically have precedence restrictions in the form of a dependency graph [9] [10]. Time critical tasks are those whose occurrence is uncertain, that are time sensitive, and that may have substantial impact on the mission. Mosquitoes are the relatively trivial tasks whose occurrence is highly uncertain, but that have insignificant impact on the mission. Each task is characterized by a set of resource requirements [12]. A platform (asset) represents a physical entity of an organization that provides resource capabilities used to process the tasks. Each platform has a specific resource capability. Successful task execution requires that the task's resource requirements be met by the overall resource capabilities of the platforms allocated to that task.
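The coverage condition in the last sentence can be illustrated with a minimal sketch. The capability vectors and the Air Base requirement are taken from Tables I and II later in the paper; the function names and the dictionary layout are assumptions made only for this illustration, and the check is a plain component-wise comparison rather than the simulator's actual task-accuracy model.

```python
from typing import Dict, List

RESOURCES = ["ASUW", "STK", "SOF"]

# Resource capability per platform class (values from Table I).
CAPABILITY: Dict[str, List[int]] = {
    "P1": [0, 2, 0],  # F18s
    "P2": [1, 0, 0],  # FAB
    "P3": [1, 1, 1],  # FOB
    "P4": [0, 0, 1],  # SOF
}


def pooled_capability(allocation: Dict[str, int]) -> List[int]:
    """Sum the capabilities of all platforms allocated to a task."""
    total = [0] * len(RESOURCES)
    for platform_class, count in allocation.items():
        for l, units in enumerate(CAPABILITY[platform_class]):
            total[l] += count * units
    return total


def covers(allocation: Dict[str, int], requirement: List[int]) -> bool:
    """True if the pooled capability meets every resource requirement."""
    pooled = pooled_capability(allocation)
    return all(p >= r for p, r in zip(pooled, requirement))


air_base = [0, 6, 0]  # requirement vector of M2 in Table II
print(covers({"P1": 3}, air_base))  # three F18s pooled
print(covers({"P1": 2}, air_base))  # two F18s pooled
```

Three F18s pool six STK units and so meet the Air Base requirement, whereas two F18s fall short.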

We model the dynamic environment as a finite-state Markov Decision Process (MDP) [3] [4] [7]. The MDP is characterized by its state and action sets and by the transition probability matrix. Given any state-action pair $(s, a)$, the probability that the next state is $s'$ is given by

\[ P^{a}_{ss'} = \Pr\{ s_{t+1} = s' \mid s_t = s,\ a_t = a \} \tag{1} \]

We assume that the structure of the MDP model of the environment is known, but that the environmental parameters (e.g., the transition probabilities) are unknown. Since the model parameters are unknown, they need to be estimated (“learned”) [3]; the estimated parameters enable us to find a near-optimal action strategy that optimizes an objective function.
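One simple way to estimate the transition probabilities of Eq. (1) from observed transitions is by relative frequency. The sketch below is only an illustration under that assumption; the function name and the sample data are hypothetical, and the paper's own procedure learns action values by Monte Carlo control rather than an explicit transition model.

```python
from collections import Counter, defaultdict


def estimate_transitions(samples):
    """Relative-frequency estimate of Pr{s' | s, a}.

    samples: iterable of (s, a, s_next) tuples observed in simulation.
    Returns a dict mapping (s, a) to a dict {s_next: probability}.
    """
    counts = defaultdict(Counter)
    for s, a, s_next in samples:
        counts[(s, a)][s_next] += 1
    model = {}
    for state_action, counter in counts.items():
        total = sum(counter.values())
        model[state_action] = {s_next: c / total for s_next, c in counter.items()}
    return model


# Hypothetical observations: action a1 in state s0 led to s1 twice and s2 once.
samples = [
    ("s0", "a1", "s1"),
    ("s0", "a1", "s1"),
    ("s0", "a1", "s2"),
    ("s0", "a2", "s0"),
]
model = estimate_transitions(samples)
print(model[("s0", "a1")])
```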

Monte Carlo control methods [3] [6] are employed to obtain a near-optimal action strategy. Monte Carlo methods require only experience-based sample sequences of states, actions, and rewards from an on-line or simulated interaction with the environment. Learning from on-line experience is striking because it requires no prior knowledge of the environment's dynamics, and yet can still attain optimal behavior asymptotically. The near-optimal strategy from the Monte Carlo control method enables us to compute the platform (resource) utilization measures of the near-optimal strategy. These measures are used to synthesize an organization that implements the near-optimal action strategy.

III.  MDP  Formulation  and  Organization 
Design  Objectives 

Fig. 1. Robust Organization Design Process

The dynamic stochastic mission environment consists of:

• Effect set: $M = \{m_1, m_2, \ldots\}$, the desired effects, with some serving as the end goals. There can be dependencies among the effects.

• Exogenous event set: $E = \{e_1, e_2, \ldots, e_l\}$, which represents uncontrollable random events in the environment.

• Action set: $A = \{a_1, a_2, \ldots, a_k\}$, which denotes controllable influences to achieve the desired effects and minimize the adverse effects of exogenous events. For each action $a$, there is a cost $c(a)$ associated with it.

When applying this modeling approach to a C2 mission environment [9] [10], the mission tasks correspond to the effects, and the time critical tasks and “mosquito” tasks correspond to exogenous events. The actions correspond to the asset allocation used to achieve the desired effects and suppress the effects of exogenous events.

The organization is a team of Decision Makers (DMs), $ORG = \{DM_1, DM_2, \ldots, DM_D\}$, i.e., the personnel or automated systems that supervise the actions in the system. DMs coordinate their information, resources, and activities in order to achieve their common goal in the dynamic and uncertain mission environment [12]. The constraint imposed on each DM takes the form of its limited resource handling capability. In this paper, we represent the resource handling capability of each DM as a workload threshold. The DM's workload is a combination of internal workload and external workload [12] [13].

The  dynamic  mission  environment  is  modeled 
as  a  finite  state  Markov  Decision  Process  ( MDP ). 
The  main  sub-elements  of  the  MDP  follow  the 
mission  characteristics  cited  earlier.  They  are  as 
follows: 

• States: $S = \{s_1, s_2, \ldots, s_z\}$

- Each state represents the status of effects and exogenous events: $s_i = (M_i, E_i)$, where $M_i \subset M$ denotes the achieved effects and $E_i \subset E$ denotes the existing, but unmitigated, exogenous events.

- The initial state is $s_b = (\emptyset, \emptyset)$, where no effect has been achieved and no exogenous event has appeared yet; the terminal states, $S_e \subset S$, represent the attainment of the desired end effects.

- The state has the Markov property; that is, the next state depends solely on its previous state and action, as in Eq. (1).

• Actions: $A = \{a_1, a_2, \ldots, a_k\}$.

• Reward mechanism:

- Reward: When an end state is reached, a fixed reward $r(s_e) > 0$ is accrued by the organization.

- Penalty: When undesirable end effects are reached, a penalty $r(s_h) < 0$ is imposed on the organization.

- Action cost: Whenever an action $a_i \in A$ is pursued, a cost $c(a_i)$ is incurred.
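The state and reward definitions above can be sketched as a small data structure. The reward, penalty, and per-platform costs are taken from Table I; treating an action's cost as the sum of the costs of its platforms, and the names `State`, `action_cost`, `terminal_reward`, and `net_reward`, are assumptions made for this illustration only.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class State(NamedTuple):
    achieved: FrozenSet[str]  # M_i: effects achieved so far
    pending: FrozenSet[str]   # E_i: unmitigated exogenous events


INITIAL = State(frozenset(), frozenset())  # s_b = (empty set, empty set)

PLATFORM_COST = {"P1": 100, "P2": 80, "P3": 160, "P4": 60}  # Table I
WIN_REWARD, LOSS_PENALTY = 5000, -3000                      # Table I


def action_cost(action: Dict[str, int]) -> int:
    """Assumed cost model: sum of the costs of the platforms used."""
    return sum(PLATFORM_COST[cls] * n for cls, n in action.items())


def terminal_reward(state: State, goal: str = "M3", max_events: int = 3) -> int:
    """Reward for reaching the goal effect, penalty when events pile up."""
    if goal in state.achieved:
        return WIN_REWARD
    if len(state.pending) >= max_events:
        return LOSS_PENALTY
    return 0


def net_reward(actions: List[Dict[str, int]], final_state: State) -> int:
    """Terminal reward (or penalty) minus the accumulated action costs."""
    return terminal_reward(final_state) - sum(action_cost(a) for a in actions)


won = State(frozenset({"M1", "M2", "M3"}), frozenset())
print(net_reward([{"P1": 3}, {"P2": 2, "P4": 2}], won))
```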

The  objectives  of  the  design  problem  are  to: 

1) Find an optimal closed-loop action strategy, viz., a mapping from state to action, to maximize the expected net reward.

2) Design an organization, i.e., the allocation of platforms to decision makers, such that the overall workload of the organization is minimized, subject to constraints on each DM's workload capability.

The MDP problem posed above is a stochastic optimal control problem [1][2]. Dynamic programming (DP) [2] is widely used to characterize the optimal solution. However, it suffers from “the curse of dimensionality”, meaning that its computational requirements grow exponentially with the number of state variables. In addition, DP needs the complete set of model parameters, whereas in real-world applications the transition probabilities are rarely known. Instead, we can gain knowledge of the system from sampled data and use it to train the strategy. We propose Monte Carlo control methods [3] [6] to obtain a near-optimal strategy for the MDP model without complete knowledge of the MDP parameters (i.e., when the transition probabilities are not known). The Monte Carlo method requires only that the MDP model generate a set of sample state transitions, not the complete probability distributions of all possible state transitions. The Monte Carlo method learns from these samples, which are generated by an on-line simulation. This, in turn, enables us to obtain an optimal strategy [3] without prior knowledge of the environment's dynamics. An agent-based simulator, such as the one in [11], can provide a vehicle to operationalize the various mission episodes utilized in the Monte Carlo method.

Platform (resource) utilization measures of the near-optimal strategy are employed to design an organization that is congruent with the mission environment. In this paper, the expected total workload of the organization is minimized. An organization tuned to such a strategy is robust in the sense that it covers a range of missions that are most likely to materialize in the dynamic environment.


IV.  Monte  Carlo  Method  for 
Near-Optimal  Action  Strategy 

Monte Carlo methods are ways of solving the reinforcement learning problem based on averaging sample returns [3] [6]. In Monte Carlo methods, we assume that the experience is divided into episodes, and that all episodes eventually terminate regardless of which actions are selected. It is only upon the completion of an episode that value estimates (estimates of the expected reward of a state or of a state-action pair) and policies (strategies, i.e., mappings from states to actions) are changed. Monte Carlo methods are thus incremental in an episode-by-episode sense [3] [6], but not in a step-by-step sense. The term “Monte Carlo” is often used more broadly for any estimation method whose operation involves a significant random component. Here, we use it specifically for methods based on averaging complete returns.
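To make "averaging complete returns" concrete, the following sketch computes the return that follows the first occurrence of each state-action pair in one finished episode. It assumes undiscounted returns, which the text does not state explicitly, and the episode data and function name are hypothetical.

```python
def first_visit_returns(episode):
    """Undiscounted return following the first occurrence of each (s, a).

    episode: list of (state, action, reward) triples, where reward is the
    reward received after taking the action in that state.
    """
    returns = {}
    reward_to_go = 0.0
    # Walk backwards so reward_to_go is the sum of rewards from each step on.
    for state, action, reward in reversed(episode):
        reward_to_go += reward
        # Earlier occurrences overwrite later ones, so the value that
        # survives belongs to the first visit of the pair.
        returns[(state, action)] = reward_to_go
    return returns


# Hypothetical episode: three costly steps, then a terminal reward.
episode = [
    ("s0", "a", -100.0),
    ("s1", "b", -50.0),
    ("s0", "a", -100.0),
    ("s2", "c", 5000.0),
]
print(first_visit_returns(episode))
```

Averaging these per-episode values over many episodes gives the Monte Carlo estimate of the action value for each pair.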

Monte Carlo methods can be applied to evaluate the state values of any given strategy, viz., the mapping of states to actions, by averaging the returns from sample episodes. Starting from Monte Carlo policy evaluation, it is natural to alternate between evaluation and improvement on an episode-by-episode basis. After each episode, the observed returns are used for policy evaluation, and then the policy is improved at all the states visited in the episode [3]. In addition, Monte Carlo methods are particularly attractive when one requires the value of only a subset of the states. One can generate many sample episodes starting from these states, averaging returns only from these states and ignoring all others.

The complete Monte Carlo control algorithm, which combines Exploring Starts (ES) and $\epsilon$-greedy, is given in Fig. 2. The ES technique starts an episode by randomly choosing the initial state. The concept of $\epsilon$-greedy means that most of the time an action that has maximal estimated action value is chosen, but with probability $\epsilon$ an action is selected at random. That is, all non-greedy actions are given the minimal probability of selection, $\frac{\epsilon}{|A(s)|}$, and the remaining bulk of the probability, $1 - \epsilon + \frac{\epsilon}{|A(s)|}$, is given to the greedy action [3], where $|A(s)|$ is the cardinality of the action set $A(s)$ in state $s$. This enables the Monte Carlo control method to get out of local minima.
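The selection probabilities just described can be written out directly. This is a minimal sketch; the function names and the example action values are hypothetical.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Selection probability of each action under epsilon-greedy.

    q_values: dict mapping action -> estimated action value in one state.
    Every action gets epsilon / |A(s)|; the greedy action additionally
    receives the remaining 1 - epsilon.
    """
    n_actions = len(q_values)
    greedy = max(q_values, key=q_values.get)
    probs = {a: epsilon / n_actions for a in q_values}
    probs[greedy] += 1.0 - epsilon
    return probs


def sample_action(q_values, epsilon, rng):
    """Draw one action according to the epsilon-greedy probabilities."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


q = {"a1": 1.0, "a2": 3.0, "a3": 2.0}  # hypothetical action values
print(epsilon_greedy_probs(q, epsilon=0.3))
print(sample_action(q, epsilon=0.3, rng=random.Random(0)))
```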
Monte Carlo Control Algorithm
(Monte Carlo ES and $\epsilon$-greedy combined)

Initialize:

$\forall s_i \in S$, $a \in A(s_i)$

• $Q(s_i, a)$ (value function of $s_i$ and $a$) ← arbitrary positive real number,

• $\pi(s_i)$ (state-action mapping function of $s_i$) ← arbitrary action,

• $Returns(s_i, a)$ ← empty list

Repeat

1) Generate an episode, $\rho$, using exploring starts [3] and $\pi$.

2) For each pair $s_i, a$ appearing in the episode $\rho$:

• $R$ ← return following the first occurrence of $s_i, a$.

• Append $R$ to $Returns(s_i, a)$

• $Q(s_i, a)$ ← average($Returns(s_i, a)$)

3) For each $s_i$ in the episode:

$\pi(s_i)$ ← $\arg\max_a Q(s_i, a)$ with probability $1 - \epsilon$
$\pi(s_i)$ ← rand($A(s_i)$) with probability $\epsilon$

$V(s_i)$ ← $\max_a Q(s_i, \pi(s_i))$

4) $\epsilon \to 0$

Until $V(s_i)$, $\forall s_i \in S$, converge, i.e., $\sum_{s_i \in S} |\Delta V(s_i)|$ is sufficiently small.


Fig. 2. Monte Carlo control method with Exploring Starts and $\epsilon$-greedy
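The loop of Fig. 2 can be sketched on a toy problem. Everything in the sketch is an assumption made for illustration: the three-stage chain environment, its two actions with their success probabilities and costs, the $1/k$ decay of $\epsilon$, and the fixed episode budget used in place of the convergence test. It is not the DDD-III scenario of Section VI.

```python
import random
from collections import defaultdict

STATES = [0, 1, 2]   # toy states: number of effects achieved so far
GOAL = 3             # terminal state
ACTIONS = {"strong": (0.9, 100.0), "cheap": (0.2, 80.0)}  # (success prob, cost)
ACTION_NAMES = list(ACTIONS)
WIN_REWARD = 1000.0


def step(state, action, rng):
    """Advance one stage with the action's success probability."""
    p_success, cost = ACTIONS[action]
    next_state = state + 1 if rng.random() < p_success else state
    reward = -cost + (WIN_REWARD if next_state == GOAL else 0.0)
    return next_state, reward


def generate_episode(policy, rng, max_steps=200):
    """Exploring start: random first state and action, then follow policy."""
    state, action = rng.choice(STATES), rng.choice(ACTION_NAMES)
    episode = []
    for _ in range(max_steps):
        next_state, reward = step(state, action, rng)
        episode.append((state, action, reward))
        if next_state == GOAL:
            break
        state, action = next_state, policy[next_state]
    return episode


def mc_control(n_episodes=20000, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTION_NAMES}
    visits = defaultdict(int)
    policy = {s: rng.choice(ACTION_NAMES) for s in STATES}
    for k in range(1, n_episodes + 1):
        epsilon = 1.0 / k  # assumed decay schedule
        episode = generate_episode(policy, rng)
        # Return following the first occurrence of each (state, action).
        reward_to_go, first_visit = 0.0, {}
        for s, a, r in reversed(episode):
            reward_to_go += r
            first_visit[(s, a)] = reward_to_go
        # Running average of all returns observed for each pair.
        for pair, ret in first_visit.items():
            visits[pair] += 1
            q[pair] += (ret - q[pair]) / visits[pair]
        # Epsilon-greedy policy improvement at the visited states.
        for s in sorted({s for s, _, _ in episode}):
            if rng.random() < epsilon:
                policy[s] = rng.choice(ACTION_NAMES)
            else:
                policy[s] = max(ACTION_NAMES, key=lambda a: q[(s, a)])
    greedy = {s: max(ACTION_NAMES, key=lambda a: q[(s, a)]) for s in STATES}
    return q, greedy


q, greedy = mc_control()
print(greedy)
```

In this toy chain the expected cost per achieved stage is lower for the "strong" action (about 111 versus 400), so the learned greedy strategy should prefer it in every state.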


Note that applying a simple Monte Carlo control method based only on exploring starts [3] is not satisfactory: some states are rarely visited, while some other states form cycles. An $\epsilon$-greedy [3] method is therefore often combined with the ES method. A fixed $\epsilon$ is not efficient: a large $\epsilon$ yields slow convergence and tends to oscillate, while with a small $\epsilon$ it is difficult to eliminate the possible state cycles. Typically, one chooses $\epsilon_t \to 0$ such that $\sum_t \epsilon_t \to \infty$.
For  example,  with  et  =  N  —  (2, 6),  the  results 
are  satisfactory. 
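One family of schedules that decays to zero while its sum still diverges is $\epsilon_t = 1/t^{p}$ with $0 < p \le 1$. The sketch below uses the harmonic case $p = 1$ purely as an illustration; it is an assumption and not necessarily the schedule used in the paper.

```python
def epsilon_schedule(t, power=1.0):
    """epsilon_t = 1 / t**power for t >= 1.

    The schedule decays to zero for any power > 0, but its partial sums
    grow without bound only when power <= 1.
    """
    if t < 1:
        raise ValueError("t must be a positive integer")
    return 1.0 / t ** power


# Partial sums of the harmonic schedule keep growing (roughly like ln t).
partial_sum = sum(epsilon_schedule(t) for t in range(1, 10001))
print(epsilon_schedule(1), epsilon_schedule(100), partial_sum)
```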

In the Monte Carlo control algorithm, all the returns for each state-action pair are accumulated and averaged, irrespective of which policy was in force when they were observed. Convergence of the Monte Carlo control method appears likely because the changes to the action-value function decrease over time. However, a formal proof has not yet appeared, and this is one of the most fundamental open questions in reinforcement learning [3] [4].


V.  Organization  Design 

In a C2 environment [10], each mission task can be modeled as a desired effect, while time critical tasks and “mosquito” tasks can be viewed as exogenous events. A task is an activity that entails the use of relevant resources (provided by the organization's platforms) and is carried out by an individual DM or a group of DMs to accomplish the mission objectives [12]. Each task $T_i$ ($i = 1, \ldots, N$) has resource requirements specified by the row vector $[R_{i1}, R_{i2}, \ldots, R_{iL}]$. A platform is a physical asset of an organization that provides resource capabilities and is used to process tasks. Each platform belongs to a unique platform class. For each platform class $P_m$ ($m = 1, \ldots$), we define its resource capability vector $[r_{m1}, \ldots, r_{mL}]$, where $r_{ml}$ specifies the number of units of resource type $l$ available on platform class $P_m$ [12]. We assume that the number of platform classes and the number of available platforms in each platform class are known. In the MDP model, each action $a_i$ is a set of platforms $P(a_i) = [x_{i1}, x_{i2}, \ldots, x_{im}]$, specifying the numbers of platforms of each class needed to pursue the action. The workload of each decision maker in the organization embodies the activity of supervising the platforms and coordinating the platforms with other decision makers to pursue the action collaboratively, so as to achieve the desired effect and mitigate the effects of exogenous events.

Starting from the near-optimal strategy, we can design an organization structure that is congruent with this strategy. In this paper, we restrict the organization design to the ownership of the platforms of every platform class, chosen so that the expected total workload of the organization is minimized while each decision maker's workload constraint is satisfied.

Let $[n_1, \ldots, n_m]$ be the available numbers of platforms for platform classes $[P_1, \ldots, P_m]$. Let $[x_{i1}, \ldots, x_{im}]$ be the numbers of platforms of each platform class for action $a_i$. Define $[X_1, \ldots, X_m]$ as the numbers of platforms for any action. Since the occurrence of actions in the optimal strategy is probabilistic, so is $[X_1, \ldots, X_m]$.

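Let $y_{ki}$ be the number of platforms of class $P_i$ allocated to $DM_k$. For the organization $ORG = (DM_1, \ldots, DM_D)$, the workload $WL_k$ of $DM_k$, $k \in \{1, \ldots, D\}$, is approximated as:

\[ WL_k = \alpha \sum_{i=1}^{m} \frac{y_{ki}}{n_i} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \left[ \frac{y_{ki}(n_j - y_{kj})}{n_i n_j} E(X_i X_j) \right], \quad \forall k = 1, \ldots, D \tag{2} \]

where $\sum_{i=1}^{m} \frac{y_{ki}}{n_i} E(X_i)$ quantifies the internal workload of $DM_k$, which accounts for the supervision of platforms that $DM_k$ owns. The internal workload of $DM_k$ incurred by platform class $P_i$ is proportional to

- the expected number of platforms of class $P_i$, i.e., $E(X_i)$;

- the proportion of platforms of class $P_i$ that $DM_k$ owns, i.e., $\frac{y_{ki}}{n_i}$.

That is, the larger the value of $E(X_i)$, the higher the workload imposed on platform class $P_i$; the larger the proportion of platforms of class $P_i$ owned by decision maker $DM_k$, i.e., $\frac{y_{ki}}{n_i}$, the higher the workload from class $P_i$ on $DM_k$. The term $\sum_{i=1}^{m} \sum_{j=1}^{m} \left[ \frac{y_{ki}(n_j - y_{kj})}{n_i n_j} E(X_i X_j) \right]$ quantifies the external workload, i.e., the coordination effort of $DM_k$ cooperating with other DMs in the organization to execute tasks. The external workload on $DM_k$ incurred by the platform classes $P_i$ and $P_j$ is proportional to

- the joint expectation of the numbers of platforms of classes $P_i$ and $P_j$, i.e., $E(X_i X_j)$;

- the proportion of platforms of class $P_i$ that $DM_k$ owns, i.e., $\frac{y_{ki}}{n_i}$;

- the proportion of platforms of class $P_j$ that $DM_k$ does not own, i.e., $\frac{n_j - y_{kj}}{n_j}$.

Here, the higher order expectations of joint numbers of platform classes, e.g., the third order expectation $E(X_i X_j X_w)$, $i, j, w \in \{1, \ldots, m\}$, are ignored. User-specified constants $\alpha$ and $\beta$ define the weights on internal and external workloads.

Thus, the total workload of all DMs is:

\[ WL_{all} = \alpha \sum_{i=1}^{m} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \left[ \left( 1 - \frac{\sum_{k=1}^{D} y_{ki} y_{kj}}{n_i n_j} \right) E(X_i X_j) \right] \tag{3} \]

The optimization problem associated with organizational design is as follows:

\[ \text{objective:} \quad \min\ WL_{all} = \alpha \sum_{i=1}^{m} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \left[ \left( 1 - \frac{\sum_{k=1}^{D} y_{ki} y_{kj}}{n_i n_j} \right) E(X_i X_j) \right] \tag{4} \]

\[ \begin{aligned} \text{s.t.} \quad & WL_k = \alpha \sum_{i=1}^{m} \frac{y_{ki}}{n_i} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \left[ \frac{y_{ki}(n_j - y_{kj})}{n_i n_j} E(X_i X_j) \right] \le C_k, \quad \forall k = 1, \ldots, D \\ & \sum_{k=1}^{D} y_{ki} = n_i, \quad \forall i = 1, \ldots, m \\ & y_{ki} \in I^{+}, \quad \forall k = 1, \ldots, D,\ \forall i = 1, \ldots, m \end{aligned} \tag{5} \]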
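The per-DM workload of Eq. (2) and the total workload of Eq. (3) can be evaluated with a few lines of code. The sketch below uses small hypothetical numbers (two platform classes, two DMs); the function names and all data values are assumptions for illustration and are not the values of Tables IV and V.

```python
def dm_workload(y_k, n, ex, exx, alpha=1.0, beta=1.0):
    """Workload of one DM, Eq. (2).

    y_k: platforms of each class owned by this DM
    n:   available platforms of each class
    ex:  E(X_i); exx: matrix of E(X_i X_j)
    """
    m = len(n)
    internal = sum(y_k[i] / n[i] * ex[i] for i in range(m))
    external = sum(
        y_k[i] * (n[j] - y_k[j]) / (n[i] * n[j]) * exx[i][j]
        for i in range(m)
        for j in range(m)
    )
    return alpha * internal + beta * external


def total_workload(y, n, ex, exx, alpha=1.0, beta=1.0):
    """Total workload of all DMs, Eq. (3). y[k][i] is the ownership matrix."""
    m = len(n)
    internal = sum(ex)
    external = sum(
        (1.0 - sum(y_k[i] * y_k[j] for y_k in y) / (n[i] * n[j])) * exx[i][j]
        for i in range(m)
        for j in range(m)
    )
    return alpha * internal + beta * external


# Hypothetical instance: 2 classes with 2 and 3 platforms, 2 decision makers.
n = [2, 3]
ex = [1.0, 1.5]
exx = [[1.5, 1.2], [1.2, 3.0]]
y = [[1, 2], [1, 1]]  # each column sums to n_i

loads = [dm_workload(y_k, n, ex, exx) for y_k in y]
print(loads, sum(loads), total_workload(y, n, ex, exx))
```

Because every column of the ownership matrix sums to $n_i$, summing Eq. (2) over all DMs reproduces Eq. (3), which the example confirms numerically.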


Note that $E(X_i)$ is the expected number of platforms of class $P_i$, and $E(X_i X_j)$ is the joint expectation of the numbers of platforms of classes $P_i$ and $P_j$.

These expectations are approximated by their sample means:

\[ \bar{X}_i \approx E(X_i), \qquad \overline{X_i X_j} \approx E(X_i X_j), \quad \forall i, j \in \{1, \ldots, m\} \tag{6} \]
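The sample means of Eq. (6) can be computed from the platform vectors of the actions taken in sampled episodes. In the sketch below, the four vectors are the first four action rows of Table III, treated as equally likely observations purely for illustration; the function name is hypothetical.

```python
def utilization_statistics(action_vectors):
    """Sample means of X_i and of the products X_i * X_j, Eq. (6).

    action_vectors: list of platform-count vectors [x_1, ..., x_m],
    one per action observed while following the strategy.
    """
    n_samples = len(action_vectors)
    m = len(action_vectors[0])
    mean = [sum(x[i] for x in action_vectors) / n_samples for i in range(m)]
    joint = [
        [sum(x[i] * x[j] for x in action_vectors) / n_samples for j in range(m)]
        for i in range(m)
    ]
    return mean, joint


observed = [
    [1, 1, 1, 3],
    [3, 0, 0, 0],
    [0, 2, 0, 2],
    [1, 2, 0, 1],
]
mean, joint = utilization_statistics(observed)
print(mean)
print(joint[0])
```

Note that the joint statistic is the raw second moment $\overline{X_i X_j}$, not a covariance, which is what Eqs. (2)-(5) require.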
This problem, defined in Eqs. (4)-(5), is a nonlinear constrained integer optimization problem, which is NP-hard. We applied the MINLP (Mixed Integer Nonlinear Programming) algorithms in the TOMLAB™ [17] optimization software to solve the above integer programming problem. The solver package minlpBB, for sparse and dense mixed-integer linear, quadratic, and nonlinear programming, was employed for this problem.
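For very small instances, the design problem of Eqs. (4)-(5) can be solved by exhaustive enumeration, which makes the structure of the problem easy to inspect. The sketch below is such a brute-force search under hypothetical data; it is a stand-in for illustration only and does not reproduce the TOMLAB minlpBB solver used in the paper.

```python
from itertools import product


def compositions(total, parts):
    """All ways to split `total` identical platforms among `parts` DMs."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def dm_workload(y_k, n, ex, exx, alpha=1.0, beta=1.0):
    """Workload of one DM, Eq. (2)."""
    m = len(n)
    internal = sum(y_k[i] / n[i] * ex[i] for i in range(m))
    external = sum(
        y_k[i] * (n[j] - y_k[j]) / (n[i] * n[j]) * exx[i][j]
        for i in range(m)
        for j in range(m)
    )
    return alpha * internal + beta * external


def design(n, ex, exx, caps, alpha=1.0, beta=1.0):
    """Enumerate ownership matrices; return (total, y, loads) or None."""
    d, m = len(caps), len(n)
    per_class = [list(compositions(n_i, d)) for n_i in n]
    best = None
    for split in product(*per_class):
        # split[i][k] = number of class-i platforms owned by DM k
        y = [[split[i][k] for i in range(m)] for k in range(d)]
        loads = [dm_workload(y_k, n, ex, exx, alpha, beta) for y_k in y]
        if any(load > cap for load, cap in zip(loads, caps)):
            continue  # violates a workload constraint C_k
        total = sum(loads)  # equals WL_all because columns sum to n_i
        if best is None or total < best[0]:
            best = (total, y, loads)
    return best


# Hypothetical instance: 2 classes with 2 platforms each, 2 DMs, C_k = 3.
n = [2, 2]
ex = [2.0, 2.0]
exx = [[4.2, 0.2], [0.2, 4.2]]
result = design(n, ex, exx, caps=[3.0, 3.0])
print(result)
```

In this instance a single DM owning everything would carry a workload of 4 and violate its constraint, so the search settles on giving each DM one whole platform class. The number of candidate ownership matrices grows combinatorially, so enumeration is only viable for toy sizes.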

VI. Simulation

A.  Simulation  Model 

We illustrate our approach to effects-based robust organization design using a simple joint-task-force scenario that can be operationalized in the distributed dynamic decision-making (DDD-III) war-gaming simulator [9]. In the example system, there are four platform classes, $P_1$, $P_2$, $P_3$, $P_4$, as defined in Table I; three mission tasks (desired effects), $M_1$, $M_2$, $M_3$; and three time critical tasks (exogenous events), $T_1$, $T_2$, $T_3$, as defined in Table II. The dependencies among mission tasks are shown in Fig. 3.

The following three types of resources are considered:

-  ASUW  -  Anti-SUbmarine  Warfare 

-  STK  -  STriKe  warfare 

-  SOF  -  Special/ground  Operations 

Rules of the simulation model, which are unknown to the learning agent, are as follows:

- All mission tasks should be executed by satisfying the dependency constraints;

-  All  time  critical  tasks,  if  they  ever  appear,  have 
to  be  completed  before  the  final  mission  task 
is  completed; 

-  Each  resource  contributes  to  the  task  accuracy; 
the  more  resources  are  allocated  to  a  task,  the 
higher  is  the  probability  of  completing  the  task. 

-  If  the  final  mission  task  is  achieved,  the  game 
is  won  and  a  reward  of  5,000  units  is  earned; 

-  If  there  are  three  time  critical  tasks  existing  in 
the  system,  the  game  is  lost  and  a  penalty  of 
3,000  units  is  incurred. 

TABLE I
Platforms

| Platform | Name | Number | ASUW | STK | SOF | Cost |
|----------|------|--------|------|-----|-----|------|
| P1       | F18s | 3      | 0    | 2   | 0   | 100  |
| P2       | FAB  | 5      | 1    | 0   | 0   | 80   |
| P3       | FOB  | 3      | 1    | 1   | 1   | 160  |
| P4       | SOF  | 2      | 0    | 0   | 1   | 60   |

| Reward (Win)   | 5000  |
| Penalty (Lose) | -3000 |

TABLE II
Missions and Time Critical Tasks

| Mission | Name                        | ASUW | STK | SOF |
|---------|-----------------------------|------|-----|-----|
| M1      | NB - Naval Base             | 2    | 0   | 2   |
| M2      | AB - Air Base               | 0    | 6   | 0   |
| M3      | PRT - Sea Port (final goal) | 2    | 2   | 0   |
| T1      | SCUD - missile launcher     | 1    | 1   | 0   |
| T2      | PH - Hostile ship           | 1    | 0   | 1   |
| T3      | TSK - complex ground task   | 0    | 1   | 1   |

B. Monte Carlo Simulations

By using the Monte Carlo control method shown in Fig. 2, we obtain a near-optimal (stable) action strategy via 85000 runs (episodes). The near-optimal action strategy, i.e., the mapping of states to actions (viz., combinations of platforms), and the values of the states are listed in Table III. A comparison of the net rewards from the near-optimal strategy obtained by the Monte Carlo control method and from a randomized greedy strategy, from 2000 runs, is shown in Fig. 4. The greedy strategy chooses randomly, at each state, one of the five most economically feasible platform combinations. After 10000 runs (sample episodes), the average net reward of the near-optimal strategy is 2985, while the average net reward of the greedy strategy is only 2127. Thus, the near-optimal strategy provides a 40% better reward than the greedy strategy.
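The 40% figure follows directly from the two reported averages; the short calculation below only restates that arithmetic.

```latex
\[
\frac{2985 - 2127}{2127} \;=\; \frac{858}{2127} \;\approx\; 0.403
\]
```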

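Fig. 3. Effect Dependency (nodes shown include M1: Naval Base and M2: Air Base)

Fig. 4. Comparison of Net Reward Distributions (panels: Near-Optimal Strategy (Histogram) and Greedy Strategy (Histogram); axis: Net Reward)

TABLE III
Near-Optimal Strategy

| State      | P1 | P2 | P3 | P4 | State Value |
|------------|----|----|----|----|-------------|
| ∅          | 1  | 1  | 1  | 3  | 2777.72     |
| M1         | 3  | 0  | 0  | 0  | 3008.48     |
| M2         | 0  | 2  | 0  | 2  | 3565.36     |
| T1         | 1  | 2  | 0  | 1  | 2115.63     |
| T2         | 0  | 2  | 0  | 2  | 2174.47     |
| T3         | 1  | 0  | 0  | 1  | 2291.07     |
| T1, T2     | 1  | 1  | 0  | 0  | 1796.93     |
| M1, T1     | 3  | 0  | 0  | 0  | 2270.26     |
| M1, T2     | 0  | 2  | 1  | 2  | 2278.97     |
| M1, T3     | 3  | 0  | 0  | 0  | 2100.65     |
| M2, T1     | 0  | 2  | 0  | 2  | 2407.18     |
| M1, T1, T2 | 1  | 0  | 1  | 1  | 1693.73     |
| M1, M2     | 1  | 2  | 0  | 0  | 3221.11     |
| M1, M2, T1 | 1  | 2  | 0  | 1  | 2358.82     |

(The P1 to P4 columns give the action, i.e., the number of platforms of each class.)

C. Organization Design Results

The platform utilization statistics, i.e., the sample means of the numbers of platforms of each class, $\bar{X}_i$, $i \in \{1, \ldots, m\}$, and the sample means of the joint numbers of platform class pairs, $\overline{X_i X_j}$, $i, j \in \{1, \ldots, m\}$, are listed in Table IV. Using the MINLP algorithms in TOMLAB™, we obtain the congruent organization for the near-optimal strategy, as shown in Table V.

TABLE IV
Platform Utilization Statistics

|             | P1  | P2  | P3  | P4 |
|-------------|-----|-----|-----|----|
| $\bar{X}_i$ | 1.2 | 2.3 | 3.1 | 2  |

| $\overline{X_i X_j}$ | P1   | P2    | P3    | P4    |
|----------------------|------|-------|-------|-------|
| P1                   | 1.5  | 0.75  | 0.25  | 1.00  |
| P2                   | 0.75 | 1.916 | 0.25  | 1.58  |
| P3                   | 0.25 | 0.25  | 0.333 | 0.667 |
| P4                   | 1.00 | 1.583 | 0.667 | 2.667 |

TABLE V
Decision Maker Platform Ownership

| DM  | P1 | P2 | P3 | P4 | Expected Workload ($\alpha = 1$, $\beta = 1$) | Workload Constraint |
|-----|----|----|----|----|-----------------------------------------------|---------------------|
| DM1 | 1  | 3  | 1  | 1  | 5.7065                                        | 8                   |
| DM2 | 1  | 1  | 1  | 1  | 4.9265                                        | 6                   |
| DM3 | 1  | 1  | 1  | 0  | 3.3766                                        | 6                   |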

VII.  Conclusion  and  Future  Work 

In this paper, we proposed a methodology for designing a robust organization for stochastic and dynamic environments. The dynamic environment is modeled as a finite-state Markov Decision Process. Using Monte Carlo control methods, a near-optimal action strategy is obtained. An organization congruent with this strategy is designed by solving an integer optimization problem.

Simulation results support the conclusion that Monte Carlo control methods are effective in obtaining near-optimal strategies in real-world applications, where the system parameters are uncertain. The formulation of the organizational design problem and mixed-integer optimization algorithms provide effective vehicles for designing organizations that are congruent with their dynamic and uncertain environments.

We are pursuing future research along the following directions:

- Incorporate realistic mission environments into the MDP model.

- Include additional organizational structure elements in the design process, e.g., command structure and information flow structure.

- Study the mechanisms of organizational adaptation via agent-based simulations.
Monte Carlo Control Method

Illustrative Example

Summary and Future Work

Design Problem


Effects-Based  Design  of  Robust  Organizations 


Objective: 

Design  robust  organizational  structures  and  strategies  to 
account  for  a  dynamically  changing  mission  environment 


Methodology: 

Mission  Model:  Finite-state  Markov  Decision  Process 
♦  Methods: 

►  Robust  strategies 

Monte  Carlo  Control  Methods 

►  Robust  structures 

Mixed  Integer  Nonlinear  Programming 
Design Methodology


Robust Organization Design Methodology:

[Figure: robust organization design methodology diagram]

Modeling  Mission  Environment  1/3 


Characteristics  of  dynamic  and  stochastic  environments: 

►  Parts  of  the  environment  cannot  be  controlled  directly 

►  Various  exogenous  events  may  impact  the  environment 

►  Consequences of actions cannot be predicted a priori with certainty


Requirements for organizations coping with stochastic environments:

►  Plan for potential contingencies

►  Remain congruent with the dynamic mission environment

►  Be robust


Modeling  Mission  Environment  2/3 


Dynamic  Stochastic  Mission  Environment: 

►  Effects:  the  desired  effects,  with  some  serving  as  the  end  goals 

►  Exogenous  events:  uncontrollable  random  events 

►  Actions:  controllable  influences  to  achieve  the  desired  effects, 
and  minimize  the  adverse  effects  of  exogenous  events 


Organization: 

A  team  of  Decision  Makers  (DM) 

►  Human  or  automated  system 

►  Limited  resource  handling  capability  (workload  threshold) 
Modeling Mission Environment 3/3

Command and Control Mission Environment and Organization

Task:

►  Resource Requirement Vector:

►  Mission tasks -> Effects

►  Time-critical tasks -> Exogenous events

►  "Mosquito" tasks

Platform (Asset):

►  Resource Capability Vector: [r_m1, r_m2, ..., r_mL]

Actions:

►  Asset-Task Allocation

Organization:

►  Ownership of Platforms

MDP  for  C2  Mission  Environment 


Markov  Decision  Process  for  C2  Mission  Environment 


States: S = {s_1, s_2, ..., s_z}

►  Status of effects and exogenous events: s_i = (M_i, E_i)

      M_i ⊆ M:  achieved effects
      E_i ⊆ E:  unmitigated exogenous events


Actions: A = {a_1, a_2, ..., a_k}, platform-to-task allocation


Transition Probability Matrix:

Reward Mechanism:


►  Reward: desired end effect is reached, r(s_e) > 0

►  Penalty: undesirable end effects are reached

►  Cost: action is pursued, C(a_i) > 0


Optimal  Action  Strategy: 

►  Mapping  from  states  to  actions,  maximizing  the  expected  net  reward 
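
To make the state and reward definitions above concrete, here is a minimal sketch of one possible encoding. The names `make_state` and `episode_net_reward`, the undiscounted sum, and the action costs 300 and 240 are assumptions made for illustration; the slides specify only that a state pairs achieved effects with unmitigated exogenous events and that actions incur a positive cost.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A state pairs the achieved effects M_i with the unmitigated exogenous events E_i.
State = Tuple[FrozenSet[str], FrozenSet[str]]


def make_state(achieved: Iterable[str], unmitigated: Iterable[str]) -> State:
    """Build a hashable state s_i = (M_i, E_i)."""
    return (frozenset(achieved), frozenset(unmitigated))


def episode_net_reward(terminal_reward: float, action_costs: List[float]) -> float:
    """Net reward of one episode: terminal reward (or penalty) minus action costs.

    Assumption: costs are simply summed, with no discounting.
    """
    return terminal_reward - sum(action_costs)


s1 = make_state([], ["T2"])      # nothing achieved yet, T2 unmitigated
s2 = make_state(["M2"], ["T2"])  # M2 achieved, T2 still unmitigated

# Hypothetical episode: terminal reward 5000 after two actions costing 300 and 240.
net = episode_net_reward(5000, [300, 240])
```

Because the states are tuples of frozensets, they can be used directly as dictionary keys in a state-action value table.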
Illustrative Example


Platform   Name   Number   ASUW   STK   SOF   Cost
P1         F18S   3        0      2     0     100
P2         FAB    5        1      0     0     80
P3         FOB    3        1      1     1     160
P4         SOF    2        0      0     1     60

Reward (Win):    5000
Penalty (Lose):  -3000

Task   Name                        ASUW   STK   SOF
M1     Naval Base                  2      0     2
M2     Air Base                    0      6     0
M3     Sea Port (final)            2      2     0
T1     SCUD missile                1      1     0
T2     Hostile ship                1      0     1
T3     TSK - complex group task    0      1     1
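
The platform and task data above can be checked mechanically. The sketch below assumes that a platform package can execute a task when its summed capability meets the task requirement in every resource type (ASUW, STK, SOF) and that the cost of an action is the sum of its platform costs; both rules are assumptions for illustration, as are the helper names. The T1 label is inferred from the row order, since it is missing from the extracted table.

```python
# Capability vectors (ASUW, STK, SOF) and unit costs from the platform table.
PLATFORMS = {"P1": (0, 2, 0), "P2": (1, 0, 0), "P3": (1, 1, 1), "P4": (0, 0, 1)}
COST = {"P1": 100, "P2": 80, "P3": 160, "P4": 60}

# Requirement vectors (ASUW, STK, SOF) from the task table.
# "T1" is an inferred label for the SCUD missile row.
TASKS = {
    "M1": (2, 0, 2), "M2": (0, 6, 0), "M3": (2, 2, 0),
    "T1": (1, 1, 0), "T2": (1, 0, 1), "T3": (0, 1, 1),
}


def package_capability(package):
    """Sum the capability vectors of a package such as {"P2": 2, "P4": 2}."""
    total = [0, 0, 0]
    for name, count in package.items():
        for r, cap in enumerate(PLATFORMS[name]):
            total[r] += count * cap
    return tuple(total)


def covers(package, task):
    """Assumed feasibility rule: capability >= requirement in every resource."""
    return all(c >= r for c, r in zip(package_capability(package), TASKS[task]))


def package_cost(package):
    """Assumed cost rule: sum of unit costs of the platforms used."""
    return sum(COST[name] * count for name, count in package.items())


a1 = {"P2": 2, "P4": 2}   # <2P2+2P4> -> M1
a2 = {"P1": 3}            # <3P1> -> M2
a3 = {"P3": 1}            # <P3> -> T2
```

Under these assumptions, each of the three packages exactly or more than meets the requirement of its target task, for example `covers(a2, "M2")` holds because three F18S platforms supply six units of STK.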

Optimal Action Strategy

►  Mapping from states to actions, maximizing the expected net reward

►  ... with probability of 1 - ε


State     Action                   S-A Value
S1(T2)    a1 (<2P2+2P4> -> M1)     1560
          a2 (<3P1> -> M2)         2000
          a3 (<P3> -> T2)          1320


Exploration  method  for  finding  a  start  state: 

Episode  starts  from  a  randomly  selected  initial  state 
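
The exploration scheme can be sketched as follows. This is a generic ε-greedy selection rule combined with a random start state, which is one common way to realize "with probability of 1 - ε" and random initial states in Monte Carlo control; whether it matches the authors' exact rule is an assumption, and the function names are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick the greedy action with probability 1 - epsilon, else a random one.

    q_values maps each available action to its current state-action value.
    """
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))
    return max(q_values, key=q_values.get)


def random_start_state(states, rng):
    """Start each episode from a randomly selected initial state."""
    return rng.choice(states)


rng = random.Random(0)
q_s1 = {"a1": 1560, "a2": 2000, "a3": 1320}   # values shown for state S1(T2)
greedy_action = epsilon_greedy(q_s1, 0.0, rng)
explored_action = epsilon_greedy(q_s1, 1.0, rng)
start = random_start_state(["S1", "S2"], rng)
```

With `epsilon = 0.0` the rule always returns the action with the largest value, here `a2`; larger values of `epsilon` trade some of that exploitation for exploration.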


Legend:

O  -  Mission Task
★  -  Time Critical Task
♦  -  "Mosquitoes"
□  -  Asset
Monte Carlo Control Method - 2/4


State s2

State         Action                              S-A Value
S2(M2, T2)    a1 (<2P2+2P4> -> M1)                1200
              a3 (<P3> -> T2)                     700
              a4 (<P2+P4> -> T1)                  1020
              a5 (<P3> -> T1, <P2+P4> -> T2)      1400

Legend:

O  -  Mission Task
★  -  Time Critical Task
♦  -  "Mosquitoes"
□  -  Asset
Monte Carlo Control Method - 3/4


Update state-action value

State         Action                              S-A Value
S1(T2)        a1 (<2P2+2P4> -> M1)                1560
              a2 (<3P1> -> M2)                    2170
              a3 (<P3> -> T2)                     1320
S2(M2, T2)    a1 (<2P2+2P4> -> M1)                1200
              a3 (<P3> -> T2)                     700
              a4 (<P2+P4> -> T1)                  1020
              a5 (<P3> -> T1, <P2+P4> -> T2)      2800
...           ...                                 ...
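
One standard way to carry out such an update is to keep a running average of the returns observed after each state-action pair. The sketch below shows that incremental-mean rule; it is a generic Monte Carlo update and not necessarily the exact rule used to produce the table. The return of 4200 is an invented number chosen so that averaging it with 1400 gives 2800; the slides do not state the observed return.

```python
def update_q(q, counts, state, action, episode_return):
    """Incremental-mean update of the state-action value Q(state, action)."""
    key = (state, action)
    counts[key] = counts.get(key, 0) + 1
    old = q.get(key, 0.0)
    # New mean = old mean + (sample - old mean) / number of samples.
    q[key] = old + (episode_return - old) / counts[key]
    return q[key]


q, counts = {}, {}
first = update_q(q, counts, "S2", "a5", 1400)    # first observed return
second = update_q(q, counts, "S2", "a5", 4200)   # hypothetical second return
```

Storing only the running mean and the visit count avoids keeping the full list of returns for every state-action pair.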
Converged state-action values


State       Action   State-Action Value
T2          a1       2115
            a2       1780
            a3       930
(M2, T2)    a1       323
            a3       1356
            a4       1454
            a5       3020
...         ...      ...
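
Given converged values, a strategy follows by choosing the highest-valued action in each state. The sketch below applies that greedy rule to the values listed above; the helper name `greedy_policy` is hypothetical, and the label of the second state is inferred from the earlier slides because it is missing from the extracted table.

```python
# Converged state-action values from the table above.
# The key "M2,T2" is an inferred label for the second group of rows.
Q = {
    "T2": {"a1": 2115, "a2": 1780, "a3": 930},
    "M2,T2": {"a1": 323, "a3": 1356, "a4": 1454, "a5": 3020},
}


def greedy_policy(q):
    """Map each state to the action with the largest state-action value."""
    return {state: max(values, key=values.get) for state, values in q.items()}


policy = greedy_policy(Q)
```

For these values the greedy rule selects `a1` in state T2 and `a5` in state (M2, T2).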


Optimal Strategy


State       Optimal Action
φ           P1+P2+P3+3P4
M1          3P1
M2          2P2+2P4
T1          P1+2P2+P4
T2          2P2+2P4
T3          P1+P4
T1, T2      P1+P2
M2, T2      P2+P3+P4

Optimal Strategy: Mapping from states to actions


Robust Organization Design 1/2


Objective: Design an organizational structure, in terms of DM ownership of
platforms, such that the overall workload is minimized

Asset utilization of the near-optimal strategy -> Mixed-integer
optimization algorithms -> Robust organization

►  Internal Workload
►  External Workload
►  Workload constraint
Integer Optimization Problem:


Objective: Minimize overall workload
Subject to:

1)  Each DM cannot exceed his workload constraint

2)  Each platform has to be assigned to a DM

Workload of DM_k:  WL_k = Internal Workload + External Workload

Internal workload ∝ Σ_{i=1}^{m} { (platform class P_i activity) *
(number of platforms of platform class P_i owned by DM_k) }

External workload ∝ Σ_{i=1}^{m} Σ_{j=1}^{m} { (platform classes P_i, P_j cross activity) *
(number of platforms of platform class P_i owned by DM_k) *
(number of platforms of platform class P_j not owned by DM_k) }
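
The workload expressions above can be evaluated directly for a candidate ownership. In the sketch below, the weights `alpha` and `beta`, the proportionality constants (taken as 1), and the activity numbers are assumptions chosen for illustration; only the platform totals (3, 5, 3, 2) and the three-DM ownership come from the illustrative example.

```python
def internal_workload(owned, activity):
    """Sum over classes of (class activity) * (platforms of that class owned)."""
    return sum(a * n for a, n in zip(activity, owned))


def external_workload(owned, totals, cross):
    """Sum over class pairs of cross activity * owned_i * (not owned)_j."""
    m = len(owned)
    return sum(
        cross[i][j] * owned[i] * (totals[j] - owned[j])
        for i in range(m)
        for j in range(m)
    )


def workload(owned, totals, activity, cross, alpha=1.0, beta=1.0):
    """Weighted total workload; alpha and beta are assumed weights."""
    return (alpha * internal_workload(owned, activity)
            + beta * external_workload(owned, totals, cross))


def every_platform_assigned(ownership, totals):
    """Constraint 2: per class, owned counts across DMs add up to the total."""
    return all(
        sum(o[i] for o in ownership.values()) == totals[i]
        for i in range(len(totals))
    )


TOTALS = (3, 5, 3, 2)                       # platforms of classes P1..P4
OWNERSHIP = {"DM1": (1, 3, 1, 1), "DM2": (1, 1, 1, 1), "DM3": (1, 1, 1, 0)}

activity = (0.5, 1.0, 0.25, 0.5)            # hypothetical class activity
cross = [[0.25] * 4 for _ in range(4)]      # hypothetical cross activity

wl_dm3 = workload(OWNERSHIP["DM3"], TOTALS, activity, cross)
feasible = every_platform_assigned(OWNERSHIP, TOTALS)
```

A solver would search over ownership vectors to minimize the overall workload while keeping each DM's `workload` value within its constraint; the functions here only evaluate one candidate.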
Illustrative Example - Revisit


Near-Optimal Strategy -> Statistics of Platform Utilization (Expected
Platform Utilization, Expected Platform Coordination) -> Mixed-integer
nonlinear programming algorithms -> Robust Organizational Structure

[Figure: expected platform utilization and coordination values for
platform classes P1 (3), P2 (5), P3 (3), P4 (2)]

Robust Organizational Structure:

DM1:  P1+3P2+P3+P4
DM2:  P1+P2+P3+P4
DM3:  P1+P2+P3
♦  Proposed a methodology for designing robust organizations for
dynamic and stochastic environments

♦  Modeled the mission environment as a finite-state Markov
Decision Process

♦  Applied Monte Carlo control methods to obtain a near-optimal
action strategy

♦  Utilized a mixed-integer optimization technique to design an
organizational structure congruent with the strategy


Modeling  Parameters: 

♦  Incorporate  more  realistic  mission  environments  into  MDP 
model 

►  Task  locations 

►  Platform  locations,  velocities 

Space  Reduction  in  Learning: 

♦  Generalization  (Function  Approximation) 

♦  Abstraction  (Factored  Representation) 

Organizational  Design: 

♦  Include additional organizational structure elements in the
design process

►  Command  structure 

►  Information  flow  structure 


Thank  You 


